我们重新审视有限混合模型中最大似然估计量(MLE)的收敛速率的经典问题。 Wasserstein距离已成为分析这些模型参数估计的标准损耗函数,部分原因是其绕过标签切换的能力并准确地表征了具有消失权重的拟合混合物组件的行为。但是,Wasserstein距离只能捕获其余拟合混合物组件中最坏的案例收敛速率。我们证明,当对数似然函数受到惩罚以阻止消失的混合权重时,可以得出更强大的损失函数以解决Wasserstein距离的这种缺点。这些新的损失功能准确地捕获了拟合混合物组件的收敛速率的异质性,并且我们使用它们在各种混合模型中使用它们来锐化现有的侧重和均匀收敛速率。特别是,这些结果表明,受惩罚MLE的组成部分的子集通常比过去的工作预期的要快得多。我们进一步表明,其中一些结论扩展到了传统的MLE。我们的理论发现得到了一项模拟研究的支持,以说明这些改善的收敛速率。
translated by 谷歌翻译
我们提出了一种统一的技术,用于顺序估计分布之间的凸面分歧,包括内核最大差异等积分概率度量,$ \ varphi $ - 像Kullback-Leibler发散,以及最佳运输成本,例如Wassersein距离的权力。这是通过观察到经验凸起分歧(部分有序)反向半角分离的实现来实现的,而可交换过滤耦合,其具有这些方法的最大不等式。这些技术似乎是对置信度序列和凸分流的现有文献的互补和强大的补充。我们构建一个离线到顺序设备,将各种现有的离线浓度不等式转换为可以连续监测的时间均匀置信序列,在任意停止时间提供有效的测试或置信区间。得到的顺序边界仅在相应的固定时间范围内支付迭代对数价格,保留对问题参数的相同依赖性(如适用的尺寸或字母大小)。这些结果也适用于更一般的凸起功能,如负差分熵,实证过程的高度和V型统计。
translated by 谷歌翻译
We present a novel corpus for French dialect identification comprising 413,522 French text samples collected from public news websites in Belgium, Canada, France and Switzerland. To ensure an accurate estimation of the dialect identification performance of models, we designed the corpus to eliminate potential biases related to topic, writing style, and publication source. More precisely, the training, validation and test splits are collected from different news websites, while searching for different keywords (topics). This leads to a French cross-domain (FreCDo) dialect identification task. We conduct experiments with four competitive baselines, a fine-tuned CamemBERT model, an XGBoost based on fine-tuned CamemBERT features, a Support Vector Machines (SVM) classifier based on fine-tuned CamemBERT features, and an SVM based on word n-grams. Aside from presenting quantitative results, we also make an analysis of the most discriminative features learned by CamemBERT. Our corpus is available at https://github.com/MihaelaGaman/FreCDo.
translated by 谷歌翻译
Can we leverage the audiovisual information already present in video to improve self-supervised representation learning? To answer this question, we study various pretraining architectures and objectives within the masked autoencoding framework, motivated by the success of similar methods in natural language and image understanding. We show that we can achieve significant improvements on audiovisual downstream classification tasks, surpassing the state-of-the-art on VGGSound and AudioSet. Furthermore, we can leverage our audiovisual pretraining scheme for multiple unimodal downstream tasks using a single audiovisual pretrained model. We additionally demonstrate the transferability of our representations, achieving state-of-the-art audiovisual results on Epic Kitchens without pretraining specifically for this dataset.
translated by 谷歌翻译
We propose a very fast frame-level model for anomaly detection in video, which learns to detect anomalies by distilling knowledge from multiple highly accurate object-level teacher models. To improve the fidelity of our student, we distill the low-resolution anomaly maps of the teachers by jointly applying standard and adversarial distillation, introducing an adversarial discriminator for each teacher to distinguish between target and generated anomaly maps. We conduct experiments on three benchmarks (Avenue, ShanghaiTech, UCSD Ped2), showing that our method is over 7 times faster than the fastest competing method, and between 28 and 62 times faster than object-centric models, while obtaining comparable results to recent methods. Our evaluation also indicates that our model achieves the best trade-off between speed and accuracy, due to its previously unheard-of speed of 1480 FPS. In addition, we carry out a comprehensive ablation study to justify our architectural design choices.
translated by 谷歌翻译
Medical image segmentation is an actively studied task in medical imaging, where the precision of the annotations is of utter importance towards accurate diagnosis and treatment. In recent years, the task has been approached with various deep learning systems, among the most popular models being U-Net. In this work, we propose a novel strategy to generate ensembles of different architectures for medical image segmentation, by leveraging the diversity (decorrelation) of the models forming the ensemble. More specifically, we utilize the Dice score among model pairs to estimate the correlation between the outputs of the two models forming each pair. To promote diversity, we select models with low Dice scores among each other. We carry out gastro-intestinal tract image segmentation experiments to compare our diversity-promoting ensemble (DiPE) with another strategy to create ensembles based on selecting the top scoring U-Net models. Our empirical results show that DiPE surpasses both individual models as well as the ensemble creation strategy based on selecting the top scoring models.
translated by 谷歌翻译
在计算机视觉领域,异常检测最近引起了越来越多的关注,这可能是由于其广泛的应用程序从工业生产线上的产品故障检测到视频监视中即将发生的事件检测到在医疗扫描中发现病变。不管域如何,通常将异常检测构架为一级分类任务,其中仅在正常示例上进行学习。整个成功的异常检测方法的家庭基于学习重建掩盖的正常输入(例如贴片,未来帧等),并将重建误差的幅度作为异常水平的指标。与其他基于重建的方法不同,我们提出了一种新颖的自我监督蒙面的卷积变压器块(SSMCTB),该卷积变压器块(SSMCTB)包括基于重建的功能在核心架构层面上。拟议的自我监督块非常灵活,可以在神经网络的任何层上掩盖信息,并与广泛的神经体系结构兼容。在这项工作中,我们扩展了以前的自我监督预测性卷积专注块(SSPCAB),并具有3D掩盖的卷积层,以及用于频道注意的变压器。此外,我们表明我们的块适用于更广泛的任务,在医学图像和热视频中添加异常检测到基于RGB图像和监视视频的先前考虑的任务。我们通过将SSMCTB的普遍性和灵活性整合到多个最先进的神经模型中,以进行异常检测,从而带来了经验结果,可以证实对五个基准的绩效改进:MVTEC AD,BRATS,BRATS,Avenue,Shanghaitech和Thermal和Thermal和Thermal罕见事件。我们在https://github.com/ristea/ssmctb上发布代码和数据作为开源。
translated by 谷歌翻译
DeNoising扩散模型代表了计算机视觉中最新的主题,在生成建模领域表现出了显着的结果。扩散模型是一个基于两个阶段的深层生成模型,一个正向扩散阶段和反向扩散阶段。在正向扩散阶段,通过添加高斯噪声,输入数据在几个步骤中逐渐受到干扰。在反向阶段,模型的任务是通过学习逐步逆转扩散过程来恢复原始输入数据。尽管已知的计算负担,即由于采样过程中涉及的步骤数量,扩散模型对生成样品的质量和多样性得到了广泛赞赏。在这项调查中,我们对视觉中应用的denoising扩散模型的文章进行了全面综述,包括该领域的理论和实际贡献。首先,我们识别并介绍了三个通用扩散建模框架,这些框架基于扩散概率模型,噪声调节得分网络和随机微分方程。我们进一步讨论了扩散模型与其他深层生成模型之间的关系,包括变异自动编码器,生成对抗网络,基于能量的模型,自回归模型和正常流量。然后,我们介绍了计算机视觉中应用的扩散模型的多角度分类。最后,我们说明了扩散模型的当前局限性,并设想了一些有趣的未来研究方向。
translated by 谷歌翻译
血管内操作中的自主机器人有可能安全可靠地浏览循环系统,同时降低对人体错误的敏感性。但是,训练机器人的过程涉及许多挑战,例如由于机器学习算法的效率低下而导致的长期培训持续时间以及导管与血管内幻影之间的相互作用引起的安全问题。物理模拟器已在血管内手术的背景下使用,但通常用于员工培训,通常不符合自主插管目标。此外,大多数当前的模拟器都是封闭消息,它阻碍了安全可靠的自主系统的协作开发。在这项工作中,我们介绍了Cathsim,Cathsim是一种开源模拟环境,可加快用于自主内血管内导航的机器学习算法的开发。我们首先使用最先进的血管内机器人模拟高保真导管和主动脉。然后,我们在模拟环境中提供了导管和主动脉之间实时力传感的能力。我们通过使用两种流行的强化学习算法,近端策略优化(PPO)和软参与者(SAC)在两个主要动脉内执行两个不同的导管插入任务来验证我们的模拟器。实验结果表明,使用我们的开源模拟器,我们可以成功训练增强型学习剂以执行不同的自主插管任务。
translated by 谷歌翻译
最近在文献中引入了用于视频异常检测的自我监督的多任务学习(SSMTL)框架。由于其准确的结果,该方法吸引了许多研究人员的注意。在这项工作中,我们重新审视了自我监督的多任务学习框架,并提出了对原始方法的几个更新。首先,我们研究各种检测方法,例如基于使用光流或背景减法检测高运动区域,因为我们认为当前使用的预训练的Yolov3是次优的,例如从未检测到运动中的对象或来自未知类的对象。其次,我们通过引入多头自发项模块的启发,通过引入多头自我发项模块,使3D卷积骨干链现代化。因此,我们替代地引入了2D和3D卷积视觉变压器(CVT)块。第三,为了进一步改善模型,我们研究了其他自我监督的学习任务,例如通过知识蒸馏来预测细分图,解决拼图拼图,通过知识蒸馏估算身体的姿势,预测掩盖的区域(Inpaining)和对抗性学习具有伪异常。我们进行实验以评估引入变化的性能影响。在找到框架的更有希望的配置后,称为SSMTL ++ V1和SSMTL ++ V2后,我们将初步实验扩展到了更多数据集,表明我们的性能提高在所有数据集中都是一致的。在大多数情况下,我们在大道,上海the夫和Ubnormal上的结果将最新的表现提升到了新的水平。
translated by 谷歌翻译